No. 394
1955

A Study of Industrial Inspection by the
Method of Paired Comparisons

Martha Littleton Kelly

Aurora College, Illinois

American Psychological Association

Psychological Monographs:
General and Applied




Copyright, 1956, by the American Psychological Association, Inc.



Vol. 69, No. 9 


Whole No. 394, 1955 


Psychological Monographs: General and Applied 


A Study of Industrial Inspection by the Method of 
Paired Comparisons¹


Martha Littleton Kelly 


Aurora College, Aurora, Illinois 


Inspection is an indispensable part of
our industrial production system.
Every manufactured product is exam- 
ined for quality at least once—if only by 
the customer prior to purchase. Most in- 
dustries, aware of the necessity for main- 
taining the quality of the goods they 
offer in a competitive market, subject 
their product to several inspections dur- 
ing manufacture. Few companies, how- 
ever, have devoted to inspection a frac- 
tion of the attention given to improving 
the product or production process. They 
seem to consider inspection a necessary 
evil, an unproductive element of manu- 
facturing cost which should be held to 
a minimum. 

Statistical quality control probably 
owes much of its growing popularity to 
the promise it offers for reducing inspec- 
tion costs. It concentrates on reducing 
the volume of product inspected and 
usually accepts the existing method; its 
formulas are based on the somewhat du- 
bious assumption that any individual 
inspector's work will be 100% accurate (7).


¹ Based on a thesis submitted to the faculty of
Purdue University in partial fulfillment of the 
requirements for the degree of Doctor of Phi- 
losophy, May, 1953. The writer wishes to express 
her appreciation to Dr. Joseph Tiffin, chairman, 
and the other members of her committee for 
their interest and guidance, and to the manage- 
ment of Corning Glass Works, Electrical Products 
Division, who supplied the material and job 
information on which this study was based. 


Basically, the inspector must examine 
a unit of product and then decide 
whether it is as good as the quality 
specified for it. Psychological techniques 
should be particularly appropriate to in- 
vestigation of the perception and judg- 
ment factors involved. But psychologists 
have been even less aware of the possi- 
bility of improving the inspection proc- 
ess than have management and produc- 
tion engineers, if the fact that only 16 
studies on inspection appeared in psychological
journals over a ten-year
period is any indication of their interest.
A majority of the published studies (2, 3, 4, 5,
8, 10, 14, 15, 16, 17, 18, 19) have
investigated the relationship between 
one or more psychological or “aptitude” 
tests and a criterion of job performance. 
Not only are the validity coefficients re- 
ported too low to be generally useful, 
but most of the criteria were derived 
from data unrelated to the subject's abil- 
ity to make correct decisions on the job. 
Four published work-sample experiments
(1, 6, 19, 20) and two unpublished
studies (9, 11) did report specific ability
factors related to job performance (such 
as near-point visual acuity) and/or per- 
tinent job factors (such as lighting). 
However, the relationships reported 
were not high enough nor were the 
studies in sufficient agreement with re- 
spect to their specific conclusions to be 
applicable to jobs other than the ones 
studied. 




The reader in search of information 
that he can use to improve the inspec- 
tion job that he is concerned about finds 
all this interesting, but somewhat frus- 
trating. If he makes a critical analysis, he 
notes what may be a significant omission 
in all of the studies. None of them at- 
tempted either description or analysis of 
the specification² controlling the inspection
process, so there is no point of departure
for comparison of the results of
the separate investigations. Individual 
differences in the subjects’ knowledge of 
the specification were unknown; hence 
that factor had to be treated as an un- 
controlled variable. One study assumed 
that this job knowledge may depend on 
experience, but little relationship be- 
tween experience and performance was 
found (20). 

It may then appear to the person in 
search of usable information that if in- 
spection could be studied by a method 
focused sharply on the actual judgments 
involved, with differences in job knowl- 
edge either controlled or eliminated as 
a variable, he might find his data indica- 
tive of the basic inspection function and 
his conclusions applicable to different 
inspection jobs. His problem then be- 
comes one of experimental design. 

A work sample designed by the method of paired comparisons³ appeared
particularly suited to the approach outlined above. Material from a
regular job presented in this design would require a decision by the
subject only as to which is the better piece in every pair that is
presented for judgment. The data would be dependent on the judgments
required by the job, but would not depend on knowledge of the
specification. Inclusion of both good and reject pieces in the sample
would give data that could be scored for accuracy. Analysis of the
degree of differentiation made between pieces within the sample,
together with the subjects’ accuracy in judging good pieces as better
than reject pieces, should indicate the adequacy of the present
specification or be a guide to its revision. That is the background of
the present study.

Inspection for appearance of television face plates or panels, a job
familiar to the experimenter, was selected for study. The glass panels
and job information were supplied by Corning Glass Works, Corning, New
York. The specific methodology was worked out in the laboratory at
Purdue University, and the investigation was completed at Corning’s
tube plant at Albion, Michigan.

² The formal statement or description of quality is called the
specification. It may be stated in terms of measurable dimensions or
tolerances, describe the appearance of a minimally acceptable unit of
product, or be a sample of the minimally acceptable product. Knowledge
of the specification, then, is the equivalent of job knowledge.

³ Also called the “method of variable stimuli,” this method was used by
early psychologists in research on discrimination and judgment. A
number of stimuli (their real value may be unknown) are presented in
pairs, each stimulus paired with every other stimulus, for a judgment
as to which of the pair seems larger, smaller, more pleasing, etc. to
the subject. Each stimulus is given a relative value or rank based on
the proportion of times it is preferred. This is an effective technique
for measuring esthetic judgments, and it seemed suited to appearance
inspection where the inspector decides simply on a basis of “how it
looks to him.”


THE PILOT STUDY 


Since there seemed to be no precedent in the literature for a study of
inspection by the method of paired comparisons, the need to develop the
design empirically was obvious. It was equally evident that a frankly
experimental approach would not be suitable for an industrial setting,
so the pilot study was designed for the laboratory at Purdue
University.

The preliminary investigation was 
planned to answer these questions: 
When an individual subject is presented 
a series of pairs of a unit of product, can 
he distinguish between the two pieces 
of a pair sufficiently to decide which is 
the better one? Will his judgments on 
the separate pairs reflect a consistent 
basis of distinction such that a summa- 
tion of the times he prefers each piece 
will result in a ranking of all the pieces 
in the order of his preference? Will he 
make a similar ranking on a second 
trial? Will a group of subjects agree 
well enough on their rankings so that 
the mean rank per piece can be employed 
as a quality scale for the sample? Will a 
scale developed in this way be reliable? 


THE FIRST METHOD
Procedure


The laboratory was set up to duplicate as closely as possible the
actual workplace of the television panel inspector in the factory. A
bench the height and width of the conveyor belt, together with a model
of the inspector's booth to be rolled along the bench, was constructed
from drawings supplied by the company. Two work samples of 10 panels
each were selected by the experimenter from a 200-piece lot of 12½-inch
round panels preinspected at the factory. Sample I consisted of 10
panels called good by the plant inspectors, but showing slight defects
judged on appearance⁴ only. Sample II consisted of 10 panels classified
as rejects, showing the same types of defect. The design of
presentation met the requirements of the paired-comparison method for
random order of pairs and balanced placement of each piece on the right
or left of the pair, within limits set by the size of the panels. The
10 panels were placed in order along the bench and the subject was
instructed to judge the pairs 1-2, 3-4, etc. As he finished 9-10, the
first panel was removed and he returned to the head of the bench to
judge 2-3, 4-5, etc. As soon as he finished the pair 8-9, all pieces
were removed and presented in an entirely new order. He judged them in
pairs as before, making two trips along the bench to make nine
comparisons. Four major shifts were required for the complete design.
Subjects were 11 staff members and graduate students in industrial
psychology at Purdue University. Each subject made two trials on each
sample, giving 45 judgments per sample or 180 judgments in all.
Standard instructions to “examine each pair of panels and indicate
which one would give a better picture in a television set” were read by
the experimenter, who also presented the material and recorded the
decisions. The number of times each subject preferred each panel was
tabulated to get individual rankings or scale values per panel.
Reliability of the individual rankings on each sample for two trials
was calculated. Agreement among the subjects was checked by correlating
the mean scale values assigned by each subject with the mean scale
values assigned by all other subjects. Scale values assigned by
individual subjects were averaged to derive mean scale values for each
sample for the first trial, then for the second, and the reliability of
the sample was computed by correlating the mean scale values of the two
trials.

⁴ The specification for these defects is contained in a minimally
acceptable panel, called a limit sample.
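The rank-order reliability named in this procedure can be sketched as follows. This is an illustration only, not the author's computation: the function names are hypothetical, tied values are given their average rank (an assumption, since the text does not say how ties were treated), and the input is the pair of Sample I rows as they can be read from Table 2 of the scan.

```python
from math import sqrt


def average_ranks(values):
    """Rank values with 1 = largest; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson_r(x, y):
    """Product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Rank-order correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


# Mean scale values for panels 1-10, Sample I, as read from Table 2 (scan).
trial_1 = [6.9, 6.5, 4.9, 3.0, 1.6, 5.8, 1.8, 6.2, 2.1, 6.2]
trial_2 = [7.8, 7.0, 4.9, 3.4, 1.3, 5.8, 1.7, 5.5, 2.5, 5.6]

rho_sample_1 = spearman_rho(trial_1, trial_2)
print(round(rho_sample_1, 2))
```

With these inputs the sketch gives a coefficient that rounds to .96, the reliability the text reports for the Sample I mean scale values.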


Results 


The results at this stage were encour- 
aging. After they had examined the first 
two or three pairs, the subjects ex- 
pressed little difficulty in reaching a de- 
cision. On the reasoning that if each 
subject were deciding merely on a 
chance basis every panel would tend to 
receive the same number of “votes,” 
whereas if all his choices were clear cut, 
each panel would be given a different 
number of votes and the series would be 
ranked 1 to 10, the number of ranks ob- 
tained from each subject’s preferences 
was noted (see Table 1). 

The reader will recall that in the 
paired-comparison design, each piece is 
paired with every other one. In the 10- 
piece sample, each piece appears in nine 
pairs—is judged nine times. A piece con- 
sistently preferred to all the others re- 
ceives nine votes. A piece preferred to 
all but one will get eight, and so on, the 




TABLE 1
NUMBER OF RANKS RESULTING FROM EACH SUBJECT'S PREFERENCES
(SAMPLES I AND II) AND RELIABILITY OF RANKS, PILOT STUDY

Subject     Number of Ranks                          Reliability, Two Trials (rho)
(N = 11)    Sample I            Sample II
            Trial 1   Trial 2   Trial 1   Trial 2

Mean, two trials      Sample I 7.9    Sample II 8.7
Median, two trials    Sample I 8.7    Sample II 9.2


least preferred piece getting no votes for 
a score of zero. The range from nine to 
zero gives a maximum of 10 ranks. If
there is no consistency in the judgments, 
each piece is as likely to be rejected as 
it is to be preferred. In this case, every 
piece would receive four or five votes, 
and there would be a spread of only two 
ranks. 
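The vote count behind these two limiting cases can be sketched in a few lines. The function names and the two example judges are hypothetical: one judge holds a strict order of preference, and the other is constructed so that every piece wins about half of its nine comparisons, which is the pattern the text expects from chance.

```python
from itertools import combinations


def tally_votes(n_pieces, prefer):
    """Present every pair once and count how often each piece is preferred."""
    votes = [0] * n_pieces
    for a, b in combinations(range(n_pieces), 2):  # always a < b
        votes[prefer(a, b)] += 1
    return votes


def number_of_ranks(votes):
    """Distinct vote totals, i.e. the number of ranks the judgments produce."""
    return len(set(votes))


def consistent_judge(a, b):
    # Hypothetical judge with a strict order: the lower-numbered piece wins.
    return a


def balanced_judge(a, b):
    # Hypothetical judge with no overall order: each piece beats only the
    # pieces that follow it closely in a circular arrangement.
    return a if b - a <= 5 else b


votes_consistent = tally_votes(10, consistent_judge)
votes_balanced = tally_votes(10, balanced_judge)

print(votes_consistent, number_of_ranks(votes_consistent))
print(votes_balanced, number_of_ranks(votes_balanced))
```

In both cases the votes sum to 45, one per pair. Only the spread of the totals differs: ten ranks for the consistent judge, two for the balanced one.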

In the pilot study, the minimum num- 
ber of ranks was 6, the mean number 
being 7.9 ranks for Sample I and 8.7 
ranks for Sample II. The median number
of ranks was 8.7 for Sample I and
9.2 for Sample II. This indicates that
each subject set up some basis for his 
decisions which differed considerably 
from chance. 

* Using paired scores of individual subjects, Trial 1 and Trial 2.

All but one of the 22 rank-order reliability coefficients calculated
for the individual scale values (or average ranks) were significantly
higher than zero at the .01 level of confidence (see Table 1). For the
two trials, they ranged from .76 to .95, averaging .88, by Fisher's z′
conversion for Sample I. On Sample II they ranged from .47 to .94,
averaging .81. (The Pearsonian r for all subjects together was .80 for
Sample I and .76 for Sample II.) Rank-order correlation of the scale
values assigned by individual subjects with the mean scale values
assigned by all other subjects ranged from .54 to .97, averaging .93
for Sample I, .88 for Sample II, and .90 for both samples. Reliability
of the mean scale values assigned by all subjects, shown in Table 2,
was .96 for Sample I and .98 for Sample II. Although an exact test of
the significance of the difference between these statistics is not
available, the two samples appear to have similar reliability.
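Averaging correlation coefficients by Fisher's conversion, as mentioned above, means transforming each coefficient to z, averaging the z values, and transforming the mean back. A minimal sketch follows. The function name and the three input coefficients are illustrative assumptions, not the eleven subjects' values.

```python
from math import atanh, tanh


def fisher_mean(coefficients):
    """Average correlations on Fisher's z scale, then convert back to r."""
    z_values = [atanh(r) for r in coefficients]  # z = atanh(r)
    return tanh(sum(z_values) / len(z_values))


# Illustrative coefficients only, chosen inside the range the text reports.
example = [0.76, 0.88, 0.95]
averaged = fisher_mean(example)
arithmetic = sum(example) / len(example)
print(round(averaged, 3), round(arithmetic, 3))
```

Because the z scale stretches values near 1.0, the converted average lies above the plain arithmetic mean when the coefficients are high and unequal.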


REVISED METHOD
Procedure
These data indicated satisfactory an- 
swers to the preliminary questions. 
However, the time required for admin- 
istration was excessive. Accordingly, the 


TABLE 1 (continued)

Ts    9    .85
Ga    7    .76   .76
Ti    10   .89
Ma    10   .92   .89
As    10   .95   .94
Ac    10   .87   .78
Wi    9    .80   .47
Ni    9    .74
Co    10   .78
McG   10   .76   .78
Fa    10   .90   .90
.76




TABLE 2
MEAN SCALE VALUES (NUMBER OF FIRST CHOICES PER PANEL) AND RELIABILITY
OF MEAN SCALE VALUES, SAMPLES I AND II, PILOT STUDY


Panel No. 


5 6 


Both trials 


Sample II 
Trial x 
Trial 2 
Both trials 


order was “streamlined” to minimize 
handling of the panels, with some sacri- 
fice of the precise control of time and 
place association characteristic of the 
first order of presentation. In this order, 
the subject judged the pairs 1-2, 2-3, 
3-4, etc., giving nine judgments on a 
single trip down the bench. As he finished
the pair in positions 9-10, the
panels in the odd-numbered positions 
were shoved to the rear of the bench. 
Those which had been in the even-num- 
bered positions were shoved up, in or- 
der, to positions 1-5, while those which 
had been in the odd-numbered positions 
were shoved down the bench to posi- 
tions 6-10. The subject compared adja- 
cent pieces as before. The panels were 
again shifted as described above. Four 
such shifts gave the required 45 pairs.
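The claim that four such shifts yield all 45 pairs can be checked directly. The sketch below is one reading of the procedure described above: panels are numbered by their starting positions, adjacent positions are compared on each trip, and the panels in even positions are then moved, in order, ahead of those in odd positions. The function name is hypothetical.

```python
def streamlined_passes(n_panels=10, n_passes=5):
    """Return the adjacent pairs judged on each trip down the bench."""
    order = list(range(1, n_panels + 1))  # panel numbers, by bench position
    passes = []
    for _ in range(n_passes):
        # One trip: compare positions 1-2, 2-3, ..., 9-10.
        passes.append(list(zip(order, order[1:])))
        evens = order[1::2]  # panels now in positions 2, 4, 6, 8, 10
        odds = order[0::2]   # panels now in positions 1, 3, 5, 7, 9
        order = evens + odds  # evens move up to 1-5, odds down to 6-10
    return passes


passes = streamlined_passes()
distinct_pairs = {frozenset(pair) for trip in passes for pair in trip}
print(len(distinct_pairs))
```

Five trips of nine comparisons, separated by four shifts, cover each of the 45 possible pairs exactly once, so no pair is repeated or missed.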

A new group of 17 students and staff 
members repeated the experiment in 
the new design. 

A further test of the stability of the 
scale values was made by combining five 
panels from each sample into a third 
work sample which was presented to the 
new group of subjects. 


Results 


The mean scale values assigned by the second group of subjects
correlated with those assigned by the first group at .97 for Sample I
and .95 for Sample II, indicating that they were not affected by the
change in design. These data are shown in Table 3.

When five panels from Sample I were 
combined with five from Sample II to 
make Sample III, it was not expected 


TABLE 3
MEAN SCALE VALUES
SECOND GROUP OF SUBJECTS, PILOT STUDY


Panel No. Sample I Sample III 


6.97 
5.06 


| 


Sample II 


| 


TABLE 2 (continued)

Panel No.        1    2    3    4    5    6    7    8    9    10
Sample I
  Trial 1        6.9  6.5  4.9  3.0  1.6  5.8  1.8  6.2  2.1  6.2
  Trial 2        7.8  7.0  4.9  3.4  1.3  5.8  1.7  5.5  2.5  5.6
  Both trials    7.2  6.8  4.9  3.2  1.4  5.8  1.8  5.9  2.3  5.9
Sample II
  Trial 1        4.4  3.9  5.9            6.0  3.0  2.8  8.2  5.5
  Trial 2        4.9  3.6  5.8  1.2  4.4  6.5  3.2  1.9  7.4  6.0
  Both trials    4.7  3.8  5.8  0.9  4.5  6.3  3.1  2.3  7.8  5.8
|| 
00 
SI 
50 
00 
77 6.53 
73 
58 
31 3.00 
22 6.18 
.08 
58 
50 4.24 
42 36 
67 2.76 
17 
58 1.46 
42 
42 8.03 


that the scale values of individual panels 
would remain the same. But an order of 
preference for each set of five similar to 
their order in the original samples 
would indicate that distinctions actually 
were being made on the basis of pairs 
rather than being determined by the 
over-all composition of the sample. Both 
sets of five were ranked in the same or- 
der in the new sample. Moreover, the 
amount of difference in scale value be- 
tween adjacent panels remained remark- 
ably similar, as shown in Table 3 and 
Fig. 1. 

These data indicated that untrained 
subjects, when comparing two pieces of 
product according to the method of the 
experiment, were able to make consist- 
ent distinctions between the pieces. The 
preference scale developed from their 
judgments was reliable and consistent 
for a second group of subjects. The preference
rank of groups of panels remained the same when these groups
were incorporated in another sample.


Fig. 1. Original scale values of 5 panels each
from Samples I and II, with new scale values
assigned them in Sample III. (Number in circle
indicates panel number in Sample III.)


THE FACTORY STUDY 


Presumably, industrial subjects have 
been trained to judge units of product 
as good or bad, the basis of their judg- 
ments being defined by the specification. 
If they were required to develop their 
judgments from the material inspected, 
as the laboratory subjects had done, 
would their decisions result in a similar 
ranking of the pieces in the sample, or 
would they rank all good pieces as equal 
and give equivalent ranks to all rejects? 
The factory study was set up to answer 
this question as well as to get informa- 
tion on the accuracy of performance in 
distinguishing acceptable from reject 
units of product. 

It was expected that the panels scaled in the laboratory would be used,
but by the time the pilot study had been completed, the 12½-inch panel
was obsolete. Consequently, a new sample from current production was
developed in the plant with the assistance of the Customer Inspector.⁵
Otherwise, the methodology was identical to that of the pilot study.


⁵ This man is the company’s final authority on
inspection of individual items. A member of the 
Sales Department, he works with the product 
engineers of the company and customer when the 
specifications are set up. When a customer com- 
plains of items in a shipment that do not meet 
the “specs,” he inspects all contested pieces at 
the customer's factory. His judgment is accepted 
as final and credits or replacements are made 
only at his direction. 




PROCEDURE 
The Work Sample 


The sample consisted of ten 20-inch rectangular television panels
selected by the Customer Inspector from a pool of approximately 40 set
aside from current production by the Quality Control Department. Only
panels containing defects of appearance (compared to an official limit
sample⁶) were included. The inspector tried several assortments, the
final one yielding a rank-order reliability of .94 (r = .90) for two
trials. It included four rejects and six good panels, one of which was
the equivalent of the limit sample for the defects represented.


The Paired-Comparison Trials 


The experiment was set up near the 
production line, using the regular 
equipment for special inspection jobs— 
a long workbench and a portable inspec- 
tor’s light which could be rolled along 
it. The sample was presented in the 
“streamlined” order to 12 quality con- 
trol and 2 line inspectors. They were in- 
structed only to “look at a pair of panels 
and then tell the experimenter which of 
the pair is better.” If they asked ques- 
tions, they were told not to worry about 
the specifications, simply to decide on 
the two panels being considered. Two 
supervisors also participated. 


The Job-Practice Run 


Following completion of the paired comparisons, each subject was asked
to inspect the 10 panels individually just as he would on the job. His
decision to pass, reject, or send each panel for reclaiming was
recorded. The following description of job method seems pertinent here.

⁶ See footnote 4.


Inspection method. Most inspectors work on
either side of a slowly moving conveyor belt 
which carries the panels or completed bulbs out 
of the controlled cooling oven or lehr. The in- 
spector places the good pieces in cartons and 
tosses the rejects into a hopper. It is common, 
but not invariable, practice to require each in- 
spector to keep a record of the pieces rejected, 
noting the defect for which each has been dis- 
carded. 

The line inspectors are usually new hires, while 
the quality control inspectors are promoted from 
the line jobs. The supervisor's recommendation
is the basis for promotion, though seniority im- 
poses some restrictions. 

New inspectors are given brief instruction by 
the supervisor. This consists of a demonstration 
of the defects to be identified (some 18 different 
ones are to be judged for severity), and some 
instruction in how the panels can be handled, 
though there is no prescribed method. They are
shown a copy of the written specification, which 
is in a book near the workplace. Since the speci- 
fication is written in engineering terms, its main 
function for both workers and supervisors is to 
provide the names of the defects. The specifica- 
tion may describe defects of appearance, and 
they are also represented by limit samples. 

The official limit samples are the sole property 
of the quality control department and are usually 
kept under lock and key. At the time of the 
experiment a cabinet of limit-sample equivalents 
at the far end of the work area was available to 
the supervisor. 

Ordinarily, new workers are placed near more 
experienced inspectors whom they probably con- 
sult when in doubt, thus eliminating some of 
the follow-up essential to good training. They are
instructed to confine their questions to the supervisor.
He decides on the spot, consults the limit
samples, or someone from quality control. Emphasis
is always on identification of the defect.
There is some reinspection of pieces passed, little 
or none of rejects. Several systems of identifying 
the work of individual inspectors have been tried 
without success, so there are no real records of 
inspector competence. 

Most of the time, the typical inspector reaches 
a decision on his own, without reference to fellow 
workers, the supervisor, or the specification.
Speed of handling and a manner of decisiveness 
are commonly considered characteristic of a good 
inspector. 

Aside from the fact that there was no opportu- 
nity for consultation in cases where the inspector 
was in doubt, performance on the job-practice 
run appeared to be typical of work on the job. 




TABLE 4 


Panel No. 


14 inspectors 


4 Management men 7.2 


“Exact” values, 
Customer Inspector 7 


RESULTS 
The Paired-Comparison Decisions 
The ability of the industrial inspectors to distinguish between panels
was estimated by noting the number of ranks resulting from the
paired-comparison preferences, and by calculating individual and mean
scale values for the panels as in the pilot study. Summary data are
given in Table 4. The minimum number of ranks obtained from an
individual subject was 8, with a mean of 9.1 ranks and a median of 9.6
ranks. Had the inspectors simply been sorting the good from the reject
panels, there would have resulted (theoretically, at least) only two
ranks, one for good panels, one for rejects.

More precise evidence on this point 
can be drawn from an examination of 
Table 5. This gives the critical ratios 
for the differences between mean scale 
values for all 45 pairs. 

Section A of Table 5 shows that in the 20 pairs where a reject piece
appeared with a good one, the critical ratios were all significant at
the .01 level of confidence. All differences for the four pairs where a
reject appeared with the limit sample were also significant at the .01
level, as shown in Section B. Section C shows that in six cases where
reject panels were paired with each other, the critical ratios ranged
in significance from .01 to .60. In Section D appear the five cases
where a good panel was paired with the limit sample: here the ratio was
significant at the .01 level in two cases, at the .10 level in one
case, and at the .20 level in the remaining two cases. Only when the
good panels appeared with each other did the significance of the
differences fall near chance expectancy, ranging from the .10 level to
no difference (see Section E). For all 45 pairs the differentiation is
far above chance expectancy. Hence it seems that our industrial
subjects did considerably more than merely dichotomize the sample.


The reliability of the mean scale values in this sample compared
favorably with that of the samples in the pilot study. The rank-order
coefficient was .94 (r = .90) for the Customer Inspector on two trials.
Split-half reliability for random halves of the inspector group was
rho = .86, r = .90, which become .92 and .98, respectively, when
stepped up by the Spearman-Brown formula. When the inspector group was
divided into matched halves, representing the same assortment of jobs
in each half, the reliability figures were rho = .93, r = .975, which
become .96 and .99 when stepped up.
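The step-up applied to the split-half figures is the Spearman-Brown formula, which for a doubled group is 2r / (1 + r). A minimal sketch, with a hypothetical function name, checked against the rho = .86 figure and the two matched-halves figures quoted above:

```python
def spearman_brown(r_half, factor=2):
    """Estimate full-length reliability from the reliability of a part."""
    return factor * r_half / (1 + (factor - 1) * r_half)


for half in (0.86, 0.93, 0.975):
    print(half, round(spearman_brown(half), 2))
```

The general form with `factor` is included only for completeness. The study uses halves, so the factor is 2.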

Accuracy of the inspectors’ decisions— 
their ability to differentiate the good 
from the reject pieces in the sample— 
may be estimated in several ways. Prob- 
ably the most definitive evidence is given 
in Table 5 which shows that all reject 
panels were distinguished from the good 
ones and from the limit sample at the .01
level of confidence by the group as a 
whole. On this basis, accuracy of the 
group is 100%. 


MEAN SCALE VALUES ASSIGNED NEW SAMPLE (FACTORY STUDY)
Group
1 2 3 4 5 6 7 8 9 10
6.3 2.6 6.6 6.1 5.4 2.3 °.9 6.6
1.2 6.0 5.8 4.5 3.0 8.2 1.5 6.2 1.2




TABLE 5
DIFFERENTIATION BETWEEN PANELS, FACTORY STUDY


(Critical ratios for difference between mean scale values assigned by 14 inspectors.) 


Pair CR p 


Pair CR p 


A. Reject vs. Good Panels 


C. Reject vs. Reject Panels 


B. Reject vs. Limit Sample 


7.38
10.22
7.45


D. Good vs. Limit Sample 


1.69 
2.92 
1.49 
2.90 
2.06 
E. Good vs. Good Panels 




The accuracy of individual inspectors 
may be estimated by examining their 
individual records, scoring as an error 
all instances where a reject is given a 
scale value higher than or equal to the 
lowest value assigned to a good panel 
or to the limit sample. By this definition, 
10 inspectors had no errors, or were 
100% accurate. The remaining four had 
one error apiece, or were 90% accurate.
Average accuracy of the group was 97%. 
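The scoring rule just described can be written out as a short sketch. The reject panels (2, 6, 8, and 10) are taken from the pair labels in Table 5, with panel 5 as the limit sample. The function names and both inspector records are hypothetical, and accuracy is taken over all 10 panels, so that one error gives 90%.

```python
REJECTS = {2, 6, 8, 10}  # reject panels, as read from the pair labels in Table 5


def count_errors(scale_values, rejects=REJECTS):
    """Count rejects scored at or above the lowest-scored acceptable panel."""
    lowest_acceptable = min(
        value for panel, value in scale_values.items() if panel not in rejects
    )
    return sum(1 for panel in rejects if scale_values[panel] >= lowest_acceptable)


def accuracy(scale_values, rejects=REJECTS):
    """Proportion of the panels not scored as errors."""
    return 1 - count_errors(scale_values, rejects) / len(scale_values)


# Hypothetical records: panel number -> times preferred (0 to 9).
no_error = {1: 8, 2: 1, 3: 7, 4: 6, 5: 4, 6: 2, 7: 9, 8: 0, 9: 5, 10: 3}
one_error = {1: 8, 2: 1, 3: 7, 4: 6, 5: 3, 6: 2, 7: 9, 8: 0, 9: 5, 10: 4}

# Ten inspectors without error and four with one error, as in the text.
group = [no_error] * 10 + [one_error] * 4
group_accuracy = sum(accuracy(record) for record in group) / len(group)
print(round(group_accuracy * 100))
```

In the second record, reject panel 10 outranks the limit sample, which is the only kind of judgment this rule counts as an error.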

Because it would be simpler to incor- 
porate a comparison of each piece being 
inspected with the limit sample than to 
apply the principle of the full design to 
the present inspection method, the “ac- 
curacy” of decisions on the nine pairs 
per trial when each panel appeared with 
the limit sample was checked for each
subject. The decision was scored as correct
when the limit was preferred to a
reject, and when the good panels (which 
had been differentiated from the limit 
sample significantly) were preferred to 
the limit sample. On that basis, accu- 
racy was 88.5% on the good panels, 91% 
on the reject panels, and 90% on the
full sample. These data are given in 
Table 6. 

The coefficient of correlation may also 
be used as an index of accuracy since the 
scale values assigned by the Customer 
Inspector are 100% accurate by definition.
They correlated with the mean
scale values assigned by the inspectors, 
rho = .87, r = .o1. Mean scale values as- 
signed by him and the supervisors may 
be considered as an additional criterion, 


2-1 6.49 .01 2-6 0.638 .60
2-3 8.51 2-8 3.40
2-4 6.73 2-10 2.20
2-7 7.07 6-8 4.24 .01
2-9 6.56 6-10 2.10
6-1 8.88 8-10 0.81 .50
6-4 10.00
6-7 11.13 .01
6-9 8.60 1-5 .20
8-1 11.73 3-5
8-3 17.81 .01 4-5 .20
8-4 13.00 7-5
8-7 11.07 9-5 .10
8.77 
10-4 9.23 68 .60 
10-7 9.08 4 .70 
10-9 7.61 43 .20 
5 
o8 .30 
2-5 00 
56 -10 
8-5 gt 
10-5 95 .40 
4 




TABLE 6
NUMBER OF CORRECT DECISIONS, 14 INSPECTORS

            Job-Practice Run            Comparison with Limit
Subject     Good*   Reject   Total      Good**   Reject   Total

* N=6, includes Limit.
** N=5, Limit excluded.


and their correlation with the inspectors’
scale values was rho = .88, r = .95.

Of particular interest to the manage- 
ment was the possibility that inspectors 
would differ in their rankings, the dif- 
ferences being attributable to their 
regular inspection jobs. Rank-order cor- 
relations of the mean scale values as- 
signed by job groups with those assigned 
by all others were: final bulb inspectors, 
.95; production check inspectors, .95; 
production parts inspectors, .93; and 
process inspectors, Blow Room, .g2. No 
significant differences attributable to the 
type of inspection usually performed 
were noted. Correlations between other 
subgroups based on sex and length of 
service, calculated in the same manner, 
were similar, showing no differences at- 
tributable to these factors. 


The Job-Practice Decisions 


Individual inspectors scored from 2 to the maximum possible 10 correct
decisions on the job-practice run, ranging from 20% to 100% accuracy,
and averaging 51%. These data are also in Table 6.

The whole group achieved an accu- 
racy of 65% on the good panels, 29% on 
the rejects, and 51% on the whole sam- 
ple. Since the Customer Inspector had 
selected only panels showing defects 
which could not be improved by re- 
claiming, the large number of decisions 
to send a panel to be reclaimed were all 
scored as errors. Had a decision to pass 
or reject been forced in each case, pre- 
sumably half the guesses would have 
been correct. With this allowance ac- 
curacy becomes 81% for good panels, 
60% for rejects, and 73% for the whole 
sample. 
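
The allowance described above can be checked arithmetically. The sketch below assumes that the allowance simply credits half of the "reclaim" decisions as correct; the share of reclaim decisions is not reported in the text and is inferred here by inverting that rule. The split of six good panels and four rejects is taken from the Table 6 footnote and the maximum of ten decisions.

```python
def with_guess_allowance(raw_accuracy, reclaim_share):
    """Credit half of the 'reclaim' decisions as correct guesses (assumed rule)."""
    return raw_accuracy + 0.5 * reclaim_share


def implied_reclaim_share(raw_accuracy, adjusted_accuracy):
    """Invert the allowance: percentage of decisions that were 'reclaim'."""
    return 2 * (adjusted_accuracy - raw_accuracy)


# Percentages reported in the text (raw, then with allowance)
good_share = implied_reclaim_share(65, 81)
reject_share = implied_reclaim_share(29, 60)
whole_share = implied_reclaim_share(51, 73)

# Consistency check: weight the panel-type figures by 6 good and 4 reject panels
weighted_raw = (6 * 65 + 4 * 29) / 10
weighted_adjusted = (6 * 81 + 4 * 60) / 10
```

On this reading roughly a third of the decisions on good panels, and nearly two thirds of those on rejects, were decisions to reclaim, and the weighted figures agree with the reported 51% and 73% after rounding.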


Table 6, total correct decisions on the job-practice run, by inspector:

Inspector   Total
1           2
2           5
3           4
9           4
10          4
5           3
6           10
[?]         6
14          3
15          4
7           6
[?]         9
12          5
13          6

Per cent correct, all inspectors: Good 65, Reject 29, Total 51


SIGNIFICANCE OF THE FACTORY STUDY

Reliability of the Paired-Comparison Judgments

The factory study substantiated the findings of the laboratory investigation, showing comparable reliability, and indicating that the method could be used to develop a reliable quality scale for defects of appearance.


Validity of the Paired-Comparison Judgments


The scale values developed by the in- 
dustrial inspectors showed a satisfactory 
validity when correlated with the values 
assigned by the Customer Inspector. 
Further evidence of their validity may 
be obtained by correlating the mean 
scale values with the number of the in- 
spectors’ decisions to pass, reject, or re- 
claim each panel on the job-practice 
run. These rank-order correlations were 
.94 between scale value and number of 
decisions to pass the panel, -.81 between scale value and number of decisions to reject it, and -.71 between scale value and number of decisions to reclaim it.


Accuracy of the Paired-Comparison Judgments


The outstanding result of the factory 
study seemed to be the superior accuracy 
of the paired-comparison judgments 
over those given on the job-practice run. 
The data are summarized below. 

The mean scale values differentiated all good from all rejects at the .01 level of confidence (accuracy 100%). Individual inspectors assigned higher values to all good panels than to any reject in 136 of a possible 140 instances (average accuracy 97%). In the nine judgments per inspector when the limit sample was compared with each of the other pieces, the inspectors were 90% accurate. In judging the panels according to job practice, they were only 51% accurate.

These data were derived from the full complement of the quality control department for one shift. Projection of the
loss of product attributable to inspector 
error to the full shift would indicate a 
loss of hundreds of pieces per day. Hence 
a sizable investment in changing inspec- 
tion method to improve accuracy should 
pay for itself in recovery of lost product. 

According to the data cited above, if the inspection procedure were changed to the full paired-comparison design, a 90% improvement in accuracy can be predicted. This would be practical only in small-lot sampling inspection. But if the method were changed to require that all pieces be compared with a limit sample instead of with the subjective or “memory image” standard currently employed, accuracy might be increased by 76%. It seemed that this change could be made easily, even in 100% inspection.
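
The two predicted gains in the preceding paragraph appear to be relative improvements over the 51% accuracy of the job-practice run. That reading is inferred from the reported figures and is not stated by the author; the sketch below shows the arithmetic under that assumption.

```python
def relative_improvement(new_pct, old_pct):
    """Gain of new over old, expressed as a percentage of old."""
    return 100 * (new_pct - old_pct) / old_pct


JOB_PRACTICE = 51  # per cent accurate on the job-practice run

# Full paired-comparison design: 97 per cent accurate
full_design_gain = relative_improvement(97, JOB_PRACTICE)

# Comparisons with the limit sample only: 90 per cent accurate
limit_only_gain = relative_improvement(90, JOB_PRACTICE)
```

The results, about 90.2 and 76.5, match the quoted improvements when rounded or truncated to whole numbers.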


A Debatable Point 


Before recommending this change to 
management, it seemed wise to check 
one questionable point with further ex- 
perimentation. Data on the accuracy of 
the comparisons with the limit sample 
were drawn from the responses made in 
the paired-comparison trials. Though it 
seemed reasonable to expect that similar 
decisions would be given if each piece 
were compared only with the limit 
sample, it was also possible that the 
subjects’ decisions were favorably condi- 
tioned by the other 36 pairs in the 
sample. Accordingly, a plan to repeat 
only the comparisons with the limit 
sample was drawn up and the company 
so advised. 
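
The count of pairs behind this concern follows from the design: ten pieces yield 45 possible pairs, of which nine involve the limit sample and 36 do not. A minimal sketch, in which the panel numbering and the label chosen for the limit sample are illustrative assumptions:

```python
from itertools import combinations

panels = list(range(1, 11))  # ten pieces in the work sample (numbering illustrative)
LIMIT = 5                    # assumed label for the limit sample

all_pairs = list(combinations(panels, 2))
limit_pairs = [pair for pair in all_pairs if LIMIT in pair]
other_pairs = [pair for pair in all_pairs if LIMIT not in pair]
```

The proposed follow-up would present only the nine pairs in `limit_pairs`, omitting the 36 in `other_pairs`.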


CHANGED FACTORY CONDITIONS


The local management reported that soon after the experiment had been carried out, a slump in the television industry had necessitated a six-month shutdown on the 20-inch television bulb. When production was resumed, not only had there been a 100% turnover in the inspection force and the work sample been lost despite precautions, but inspection methods had been changed considerably.

At that time the local management 
had not received any information from 
the experimenter, except a casual com- 
ment in a social conversation, that the 
paired-comparison decisions were ap- 
parently more accurate than those made 
on the job-practice trials. The Quality
Control Director reported that “watch- 
ing the experiment gave them some 
ideas which they incorporated in the 
inspection job when they had to set it 
up again for new people.” He described 
the new method as follows: 

Revised inspection method. The first 
step in preparing for resumption of pro- 
duction was the selection of a set of 
limit sample equivalents for each work- 
place. These were selected by the pooled 
judgment of the Quality Control Dir- 
ector, several of his experienced super- 
visors, and the Customer Inspector. The 
samples are stored near the workplace, 
and at the beginning of each shift the 
supervisor selects those samples which 
represent the particular defects appearing in current production and places
them on a rack built over the conveyor 
belt by which the inspectors work. The 
limit then appears in the same physical 
relationship to the piece under inspec- 
tion as it did in the experiment. (There 
is no change in the number of specific 
defects listed in the formal specification, 
but defects tend to occur in families, de- 
pendent on whether they are attribut- 
able to machine operation, temperature, 


quality of raw material, etc. Hence only 
two or three limit samples would be 
needed at a time.) The formal specifica- 
tion was simplified, illustrated with dia- 
grams, and posted at the workplace. 

Inspectors are hired or upgraded as 
before, and it is still the supervisor's 
responsibility to break them in. He usu- 
ally begins his instruction by referring 
to the posted specification, but concen- 
trates on the technique of comparing 
each piece with the limit sample which 
has been installed. The gist of his in- 
struction is that pieces that are inferior 
to the sample are to be put in the hop- 
per, while those that are as good as it is 
or better are to be placed elsewhere for 
transportation to final assembly or pack- 
ing. The inspector still knows that he is 
separating good from reject pieces, but 
the emphasis has now been placed on 
the specific comparisons. The recording 
and reinspection systems are about as 
before. The supervisor spends about the
same amount of time in instruction, but 
he feels that the inspectors learn the job 
more quickly and maintain a more con- 
sistent performance than before. 

It was obvious that job method had 
changed so much that further experi- 
mentation would not give results com- 
parable to those already obtained. There- 
fore, information indicative of inspec- 
tion efficiency under the two methods 
was requested from the regular produc- 
tion records. The experimenter sug- 
gested two items which might be per- 
tinent: (a) The number of rejections at 
final inspection should indicate errors 
made by inspectors performing first or 
parts inspection; and (b) the accuracy 
of final inspectors should be indicated 
by the number of customer complaints 
(see footnote 5), since the pieces customers complain about are the ones for which the company actually makes replacements or refunds.


TABLE 7
PRODUCTION RECORDS ON THE 20-INCH TELEVISION BULB

Item                                    Period A,       Period B,
                                        First Method    Revised Method

Total panels produced by machine        213,828         219,108
No. rejects at first inspection          71,892          90,618
No. panels sent to assembly             141,936         128,490
No. rejects at final inspection           1,728           1,250
No. shipped to customer                  16,715          33,059
No. rejects by customer                     279             130


Accuracy of Inspection by the Two Methods

Production records on the 20-inch rectangular television bulb were reviewed for the six months preceding the shutdown (Period A) and for the six months immediately after production was resumed (Period B). The plant management supplied the data in Tables 7 and 8.


The critical ratios given in Table 8 for the differences in percentage of inspection losses by the two methods are all statistically significant at better than the .000001 level of confidence according to standard tables.
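
A critical ratio of this kind can be sketched from the Table 7 counts. The computation below assumes the conventional pooled two-proportion formula (difference in proportions divided by its standard error); the monograph does not state which formula was used, so the value is illustrative rather than a reproduction of the printed ratios.

```python
from math import sqrt


def critical_ratio(rejects_a, n_a, rejects_b, n_b):
    """Pooled two-proportion z statistic (assumed formula, not the author's stated one)."""
    p_a = rejects_a / n_a
    p_b = rejects_b / n_b
    pooled = (rejects_a + rejects_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / std_error


# Final inspection: rejects out of panels sent to assembly (Table 7)
z_final = critical_ratio(1728, 141936, 1250, 128490)

# First inspection: rejects out of panels produced by machine (Table 7)
z_first = critical_ratio(71892, 213828, 90618, 219108)
```

Under this formula the final-inspection ratio is a little over 6, and the first-inspection ratio is large and negative, reflecting the higher first-inspection reject rate in Period B.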

It will be noted that the machine 
efficiency was apparently better during 
Period A, a fact which refutes the pos- 
sibility that the decreased losses due 
to inspection at later stages of manu- 
facture can be attributed simply to 
better quality of production by the 
machine. Management men who assisted
in the study report that demand was
considerably higher during Period A
than during Period B. During Period B
the television set manufacturers were
just recovering from a severe cutback
themselves. As a result, their orders were 
smaller and their quality specifications 
higher. Therefore, the change in the per- 
centage of customer rejects cannot be 
attributed to relaxing of the specifica- 
tion. Hence it seems reasonable to as- 
cribe the improvement to the change in 
the method of inspection. 


TABLE 8
PERCENTAGE COMPARISONS FOR INSPECTION LOSSES IN TWO PERIODS OF PRODUCTION (A AND B)

                      % Rejects           Difference     % Improvement
                      A         B         A minus B      B over A

First inspection      33.62     41.36     -7.74          -23.0*
Final inspection      1.217     0.973      0.244          20.0
Customer rejects      1.67      0.39       1.28           76.8

* Indicates difference is in favor of Period A.
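
The percentages in Table 8 can be recomputed from the counts in Table 7. The sketch below does so for the first and final inspection rows, assuming that "% improvement" means the difference expressed as a percentage of the Period A rate; that definition is inferred from the printed figures.

```python
def pct(part, whole):
    """Part as a percentage of whole."""
    return 100 * part / whole


def table8_row(rejects_a, n_a, rejects_b, n_b):
    """Percentage rejects in each period, their difference, and relative improvement."""
    a = pct(rejects_a, n_a)
    b = pct(rejects_b, n_b)
    return {"A": a, "B": b, "diff": a - b, "improvement": 100 * (a - b) / a}


# Counts taken from Table 7
first_inspection = table8_row(71892, 213828, 90618, 219108)
final_inspection = table8_row(1728, 141936, 1250, 128490)
```

The recomputed final-inspection improvement comes to about 20.1 from unrounded rates, against the printed 20.0, which suggests the printed column was derived from the rounded percentages.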


SUMMARY AND CONCLUSIONS 


The results obtained in this study 
support the hypothesis stated in the In- 
troduction that a methodology which 
is focused sharply on the actual judg- 
ments involved in an appearance in- 


spection job might give data indicative 
of the basic inspection function and sug- 
gest conclusions which might be genera- 
lized to other inspection jobs. 




THE METHOD OF COMPARISONS


As an experimental technique, the 
method of paired comparisons seems to 
have the following advantages in a study 
of an industrial inspection job. 

High Reliability 

Reliability of the paired-comparison 
ranks of the work samples used in the 
study exceeded .90 for industrial subjects and two groups of untrained subjects. This is a considerable improvement over the reliability reported for
most work-sample experiments in the 
literature, and a marked increase over 
the reliability of the criteria in the 
studies attempting to validate psycho- 
logical tests. Not only were the scale 
values (or average ranks) consistent, but 
the relative rank assigned to part of a 
sample remained similar when that part 
appeared in a different sample. Reli- 
ability coefficients obtained in this study 
compare favorably with those reported 
for other uses of the method (12). 


High Accuracy of Comparative Judgments

Accuracy of the decisions obtained in the paired comparisons exceeded that obtained by the traditional method in the job-practice run by at least 90%. Individual comparisons with the limit sample* were 76% more accurate than the job-practice decisions. The accuracy figures of 97% and 90%, respectively, are considerably higher than those reported for work-sample studies using other methods (9, 11, 13, 20).


Validity and Applicability of the Data 


Validity coefficients of .91 and .95 for the mean scale values assigned by the inspectors with those assigned by the Customer Inspector and management, respectively, indicate that judgments obtained by the paired-comparison method are actually valid. Also, the mean scale values assigned by the inspectors correlated .94 with the number of their decisions to pass each panel on the job-practice run.

* See footnote 4.

The major finding of this study is that 
specific or comparative judgments are 
more uniform and accurate than the 
type usually made on the job. The ap- 
plication of this finding by the company 
management in its revised inspection 
method resulted in an improvement of 
76% in accuracy of final inspection. 
This is evidence of the practical appli- 
cability of the findings of the present 
study. 


POSSIBLE APPLICATIONS


The efficiency of the inspection process, as it is usually carried out, rests
on the assumption that the inspector's 
idea or recollection of the limit sample 
is invariable whether the product is 
running mostly good or mostly bad. 
There is general recognition of the fact 
that quality of product changes fre- 
quently, since sampling inspection is 
customarily performed at 30-minute intervals, but there is little or no awareness of the effect of variations in product
quality on inspector efficiency. 

To the writer, the study suggests that, 
despite apparent aptitude, training, and 
experience, skilled industrial inspectors 
make their judgments in the same fash- 
ion as do untrained subjects—influ- 
enced to a considerable degree by the 
characteristics of the material under in- 
spection at the time. The change in 
method suggested by the experiment ap- 
parently corrected this bias to a con- 
siderable extent, and was followed by a 




marked improvement in accuracy. Since 
the right of management to improve 
efficiency by selection and placement of 
present employees is subject to restric- 
tion or challenge under present industrial 
relations procedures, whereas control of 
work methods remains its prerogative, 
the advantage of data applicable to job 
method rather than limited to place- 
ment is obvious. 


To 100% Inspection

The principle of requiring inspector 
judgments to be made against a limit 
sample may be applied either to sampling or to 100% inspection. Its application
should result in a substantial increase in 
inspector accuracy. Writers, supervisors, 
and inspectors themselves agree that in 
the usual job situation the inspector's 
memory of the specification is continu- 
ally being affected by the general quality 
level of the product he happens to be 
handling at the time. The presence of 
an invariable standard should eliminate 
or reduce this bias. In the event that 
the specification is changed, substitution 
of a new limit sample would change in- 
spection judgments automatically, mak- 
ing retraining of inspectors unnecessary. 


To Selection of Limit Samples 


The method of the experiment should 
be useful as a scientific check on the 
adequacy of the specification or the 
limit sample which represents it. Fre- 
quently the description of a minimally 
acceptable unit of product is the result of 
considerable negotiation between prod- 
uct engineers during the writing of a 
sales contract and has little reference to 
the inspectors’ ability to make the dis- 
tinction in the degree of defect called 
for. Scaling several samples from normal 
production should indicate that point 
in the graded series where consistent 


distinctions can be made. Unless the 
specification describes a unit near the 
line of practical demarcation, some re- 
vision is indicated before the limit 
sample will actually function as in- 
tended. 

The efficiency of any particular limit sample can be checked by using it in several samples (from the same production run, of course) to determine whether it is consistently scaled at the lower limit of the good pieces, and significantly above the rejects.
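
The check just described might be sketched as follows. The sketch substitutes a simple count of how often each piece is preferred for the scale values developed in the study, and it tests only the ordering, not statistical significance; the piece labels, the judgments, and both function names are hypothetical.

```python
from collections import Counter


def choice_scores(judgments, items):
    """Count how often each item was judged the better member of its pair."""
    wins = Counter({item: 0 for item in items})
    for better, _worse in judgments:
        wins[better] += 1
    return dict(wins)


def limit_is_well_placed(scores, limit, goods, rejects):
    """True if the limit scores no higher than any good piece and above every reject."""
    below_goods = all(scores[limit] <= scores[g] for g in goods)
    above_rejects = all(scores[limit] > scores[r] for r in rejects)
    return below_goods and above_rejects


# Hypothetical sample: A and B are good, L is the limit sample, X and Y are rejects.
items = ["A", "B", "L", "X", "Y"]
best_to_worst = ["A", "B", "L", "X", "Y"]  # a perfectly consistent judge

# Every pair, recorded as (better, worse)
judgments = [
    (best_to_worst[i], best_to_worst[j])
    for i in range(len(best_to_worst))
    for j in range(i + 1, len(best_to_worst))
]
scores = choice_scores(judgments, items)
placed = limit_is_well_placed(scores, "L", goods=["A", "B"], rejects=["X", "Y"])
```

Repeating the check on several samples from the same production run would show whether the limit sample holds this position consistently.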

Usually there is but one limit sample 
per defect designated for an item, while 
the production situation requires that 
inspection be performed at several locations at the same time. The design of
the study could be used to identify other 
units of product which are not distin- 
guishable from the original limit sample 
and these equivalents could be used at 
all inspection stations. 


To Sampling Inspection 

The procedure of this study is directly 
applicable to sampling inspection where 
the size of the sample does not exceed 
ten pieces per half hour. Inclusion of a 
limit sample in the sample lot should 
make possible a quantitative estimate of 
the quality of the total lot. Inspection 
records obtained in this manner should 
be less subject to falsification or error 
than are the present records.

The method may be used to develop a 
quality scale for degree of defect for 
those defects now judged simply on an 
acceptance basis. Such a scale would 
make possible the use of X-bar and R charts
in many situations where the volume 
required by the p or percentage defec- 
tive charts prohibits their use in prod- 
uct or process control (7). Such an 
application would be pertinent to the 
product studied in this experiment. 
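
As an illustration of how scale values could feed control charts for averages and ranges, the sketch below computes center lines and control limits from subgroups of five pieces, using the standard chart constants for that subgroup size. The scale values are invented, and the choice of five pieces per subgroup is an assumption made for the example.

```python
# Standard control-chart constants for subgroups of size 5
A2, D3, D4 = 0.577, 0.0, 2.114


def xbar_r_limits(subgroups):
    """Return (lower limit, center line, upper limit) for the average and range charts."""
    if any(len(g) != 5 for g in subgroups):
        raise ValueError("constants above apply only to subgroups of five")
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)
    mean_range = sum(ranges) / len(ranges)
    return {
        "xbar": (grand_mean - A2 * mean_range, grand_mean, grand_mean + A2 * mean_range),
        "r": (D3 * mean_range, mean_range, D4 * mean_range),
    }


# Hypothetical scale values for five pieces drawn at each of four sampling times
samples = [
    [6.0, 7.0, 5.0, 8.0, 9.0],
    [7.0, 7.0, 6.0, 8.0, 7.0],
    [5.0, 6.0, 7.0, 6.0, 6.0],
    [8.0, 9.0, 7.0, 8.0, 8.0],
]
limits = xbar_r_limits(samples)
```

A chart of this kind needs only a handful of scaled pieces per sampling period, whereas a percentage-defective chart requires a much larger volume of pass or reject decisions.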




REFERENCES

1. Ayers, A. W. A comparison of certain visual factors with the efficiency of textile inspectors. J. appl. Psychol., 1942, 26, 812-827.

2. Drake, C. A. Inspectors are born that way. Fact. Mgmt Maint., 1940, 98 (4), 44-45, 102, 104.

3. Drake, C. A. New developments in the selection of factory workers. Prod. Ser. Amer. Mgmt Ass., 1940, No. 127, 32-43.

4. Feinberg, R., & Coleman, J. H. Vision tests for inspectors insure good placement. Fact. Mgmt Maint., 1945, 103 (1), 106-110.

5. Tests for the selection of inspector packers. Psychol. Bull., 1941, 38, 735-. (Abstract)

6. Giese, W. J., & Sacen, H. E. Ampoule inspection. Drug Cosmetic Ind., 1950, 66, 518.

7. Grant, E. L. Statistical quality control. New York: McGraw-Hill, 1946.

8. Kephart, N. C. Visual skills and labor turnover. J. appl. Psychol., 1948, 32, 51-55.

9. Kephart, N. C., & Wisse, J. Vision inspection performance research, Foundation Case No. 1020. Unpublished study, Purdue Univer., 1949.

10. Kerr, W. A. Vision tests for precision workers at R.C.A. Personnel Psychol., 1948, 1, 64-66.

11. Lawshe, C. H. An experimental study of the relative efficiency of two methods for inspecting ophthalmic lenses. Unpublished study, Purdue Univer., 1945.

12. Lawshe, C. H., Kephart, N. C., & McCormick, E. J. The paired comparison technique for rating performance of industrial employees. J. appl. Psychol., 1949, 33, 69-77.

13. Lawshe, C. H., Jr., & Tiffin, J. The accuracy of precision instrument measurement in industrial inspection. J. appl. Psychol., 1945, 29, 413-419.

14. McMurry, R. N., & Johnson, D. L. Development of instruments for selecting and placing factory employees. Advanced Mgmt, 1945, 10, 119-120.

15. Maher, H., & Fife, Isabelle E. A biological-pharmaceutical checker selection program. J. appl. Psychol., 1947, 31, 469-476.

16. Mann, Ida, & Archibald, D. A study of a selected group of women employed on extremely fine work. Brit. Med. J., 1944, 1, 387-390.

17. Rundquist, E. A., & Bittner, R. H. Validation of tests for glass bottle inspectors at Owens-Illinois Glass Co. Cited by Bingham, W. E. in Great expectations. Personnel Psychol., 1949, 2, 398.

18. Sartain, A. Q. The use of certain standardized tests in the selection of inspectors in an aircraft factory. J. consult. Psychol., 1945, 234-235.

19. Shuman, J. T. The value of aptitude tests for factory workers in the aircraft engine and propeller industries. J. appl. Psychol., 1945, 29, 156-160.

20. Tiffin, J., & Rogers, H. B. The selection and training of inspectors. Personnel, 1941, 18, 14-31.

(Accepted for publication October 20, 1954)


